AITopics | panoptic segmentation

Seg-VAR: Image Segmentation with Visual Autoregressive Modeling

Neural Information Processing SystemsJun-23-2026, 02:29:10 GMT

While visual autoregressive modeling (VAR) strategies have shed light on image generation with the autoregressive models, their potential for segmentation, a task that requires precise low-level spatial perception, remains unexplored. Inspired by the multi-scale modeling of classic Mask2Former-based models, we propose SegVAR, a novel framework that rethinks segmentation as a conditional autoregressive mask generation problem. This is achieved by replacing the discriminative learning with the latent learning process. Specifically, our method incorporates three core components: (1) an image encoder generating latent priors from input images, (2) a spatial-aware seglat (a latent expression of segmentation mask) encoder that maps segmentation masks into discrete latent tokens using a location-sensitive color mapping to distinguish instances, and (3) a decoder reconstructing masks from these latents. A multi-stage training strategy is introduced: first learning seglat representations via image-seglat joint training, then refining latent transformations, and finally aligning image-encoder-derived latents with seglat distributions. Experiments show Seg-VAR outperforms previous discriminative and generative methods on various segmentation tasks and validation benchmarks. By framing segmentation as a sequential hierarchical prediction task, Seg-VAR opens new avenues for integrating autoregressive reasoning into spatial-aware vision systems.

large language model, machine learning, segmentation, (19 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

e8e30fda5ab87ea93360a36288ac0145-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 03:55:03 GMT

machine learning, natural language, segmentation, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)
(2 more...)

Add feedback

d4eed238cf5807c6b75face996302892-Paper-Conference.pdf

Neural Information Processing SystemsApr-29-2026, 21:50:09 GMT

machine learning, natural language, segmentation, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)
(2 more...)

Add feedback

K Net Towards Unified Image Segmentation

Neural Information Processing SystemsApr-25-2026, 23:43:32 GMT

Semantic, instance, and panoptic segmentations have been addressed using different and specialized frameworks despite their underlying connections. This paper presents a unified, simple, and effective framework for these essentially similar tasks. The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class. To remedy the difficulties of distinguishing various instances, we propose a kernel update strategy that enables each kernel dynamic and conditional on its meaningful group in the input image. K-Net can be trained in an end-to-end manner with bipartite matching, and its training and inference are naturally NMS-free and box-free. Without bells and whistles, K-Net surpasses all previous published stateof-the-art single-model results of panoptic segmentation on MSCOCO test-dev split and semantic segmentation on ADE20K val split with 55.2% PQ and 54.3% mIoU, respectively. Its instance segmentation performance is also on par with Cascade Mask R-CNN on MSCOCO with 60%-90% faster inference speeds. Code and models will be released at https://github.com/ZwwWayne/K-Net/.

machine learning, natural language, segmentation, (19 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
(2 more...)

Add feedback

ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation

Neural Information Processing SystemsFeb-17-2026, 17:48:31 GMT

This paper presents a new mechanism to facilitate the training of mask transformers for efficient panoptic segmentation, democratizing its deployment. We observe that due to the high complexity in the training objective of panoptic segmentation, it will inevitably lead to much higher penalization on false positive.

artificial intelligence, machine learning, segmentation, (17 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

d4eed238cf5807c6b75face996302892-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 07:57:21 GMT

machine learning, natural language, segmentation, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)
(2 more...)

Add feedback

c9ef471a579197c4ed99df2aa542ce97-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 02:22:36 GMT

artificial intelligence, machine learning, segmentation, (19 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Health & Medicine (0.30)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

c9ef471a579197c4ed99df2aa542ce97-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 02:22:27 GMT

machine learning, natural language, segmentation, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Supplementary Material 1 Additional Implementation Details

Neural Information Processing SystemsFeb-15-2026, 10:27:56 GMT

We printed a checkerboard with a 9x10 grid of blocks, each measuring 87 mm x 87 mm. Parameter V alue Model Architecture Panoptic-PolarNet Test Batch Size 2 V al Batch Size 2 Test Batch size 1 post proc threshold 0.1 post proc nms kernel 5 post proc top k 100 center loss MSE offset loss L1 center loss weight 100 offset loss weight 10 enable SAP True SAP start epoch 30 SAP rate 0.01 Table 3: Parameters for Panoptic Segmentation model Parameter V alue(s) Model Architecture 4D-StOP Learning Rate 0.0005 Momentum 0.98 Stride 1 Max in points 5000 Sampling importance Decay Sampling None Input Threads 16 Checkpoint Gap 100 Table 4: Parameters for the 4D Panoptic Segmentation model The results reveal a significant variance in performance across different categories. Notably, 'Structure' and'Ground' both achieved high mIoU at Result The results are shown in Table 8. presents the mean intersection-over-union (mIoU) percent-56 Notably, 'Structure' achieved the highest mIoU at'General Objects' category have the lowest mIoU, The dataset is divided into 17 and 6 categories, respectively. Ground' and'Roads', as opposed to grouping anything related to ground as a single category. Overall, the performance across these tasks underscores the challenges posed by our dataset's With our dataset, future work can focus on improving the model's capacity to handle such diverse The raw data, processed data, and framework code can be found on our website.

category, machine learning, object-oriented architecture, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.04)

Technology: